GUBS: a Utility-Based Semantic for Goal-Directed Markov Decision Processes
Abstract
A key question in stochastic planning is how to evaluate policies when a goal cannot be reached with probability one. In the Goal-Directed Markov Decision Process (GDMDP) formalization, the outcome of a policy is usually summarized by two criteria: the probability of reaching a goal state and the expected cost to reach a goal state. The dual-criterion solution adopts a lexicographic preference: it first maximizes the probability of reaching a goal state and then minimizes the expected cost to the goal. Other solutions consider only cost, relying on mathematical devices to guarantee that every policy has a finite expected cost. In this paper we show that the lexicographic solution does not allow a smooth trade-off between goal and cost, while the expected-cost solution does not define a goal semantics. We propose GUBS (Goals with Utility-Based Semantic), a new model that evaluates policies based on expected utility theory; this model defines a trade-off between cost and goal by proposing an axiomatization for goal semantics in GDMDPs. We show that our model can be solved by any continuous-state MDP solver and propose an algorithm to solve a special class of GDMDPs.
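To make the contrast concrete, here is a minimal Python sketch comparing the two criteria on a toy example. It collapses each policy to the pair (probability of reaching the goal, expected cost to the goal), a deliberate simplification of a full GDMDP, and scores that pair with an exponential cost utility u(c) = 1 - e^(lam*c) plus a goal bonus k_g. The functional form and the constants lam and k_g are illustrative assumptions, not the exact GUBS criterion from the paper.

import math

# Each candidate policy is collapsed to (P(reach goal), E[cost to goal]).
# This summary is an illustrative simplification of a full GDMDP policy.
policies = {
    "likelier-but-expensive": (0.99, 100.0),
    "riskier-but-cheap":      (0.98, 1.0),
}

def lexicographic_best(summaries):
    # Dual criterion: maximize goal probability first, then minimize cost.
    return max(summaries, key=lambda n: (summaries[n][0], -summaries[n][1]))

def utility_score(p_goal, exp_cost, lam=0.05, k_g=10.0):
    # Assumed utility: u(c) = 1 - e^(lam*c) (so u(0) = 0, decreasing in c)
    # plus a bounded bonus k_g for reaching the goal at all.
    u_cost = -math.expm1(lam * exp_cost)
    return p_goal * (u_cost + k_g)

def utility_best(summaries, **kw):
    # Expected-utility criterion: a single score trades goal against cost.
    return max(summaries, key=lambda n: utility_score(*summaries[n], **kw))

print(lexicographic_best(policies))  # -> "likelier-but-expensive", at any cost
print(utility_best(policies))        # -> "riskier-but-cheap" for these lam, k_g

Under the lexicographic rule the marginally likelier policy wins no matter how large its cost, whereas the utility score lets lam and k_g set an explicit exchange rate between goal probability and cost; that is the smooth trade-off the abstract refers to.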
Similar Resources
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi-agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning-automata-based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques for solving large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs) that can be classified into levels. At each level, smaller problems called restricted MDPs are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorithm ...
Fast Value Iteration for Goal-Directed Markov Decision Processes
Planning problems where the effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting goal-directedness to accelerate value iteration, a standard algorithm for solving Markov decision processes. Empirical studies have shown that the techniques can bring about significant ...
Decision-Theoretic Subgoaling for Planning with External Events
I describe a planning methodology for domains with uncertainty in the form of external events that are not completely predictable. Under certain conditions, these events can be modelled as continuous-time Markov chains whose states are characterised by the planner’s domain predicates. Planning is goal-directed, but the subgoals are suggested by analysing the utility of the partial plan rather than ...
Representations of Decision-Theoretic Planning Tasks
Goal-directed Markov Decision Process models (GDMDPs) are good models for many decision-theoretic planning tasks. They have been used in conjunction with two different reward structures, namely the goal-reward representation and the action-penalty representation. We apply GDMDPs to planning tasks in the presence of traps such as steep slopes for outdoor robots or staircases for indoor robots, a...